Load Balance and Communication Tradeoffs in Parallel

Author

  • Peter Strazdins
Abstract

In block-partitioned parallel matrix factorization algorithms, where the matrix is distributed over a logical torus processor grid with an r × s block-cyclic matrix distribution, the greatest scope for optimization exists in the formation of (block) panels. Let ω be the panel width, with ω_m being an optimal value based on the characteristics of a single processor's memory hierarchy. To date, the two well-known techniques for panel formation are storage blocking, where ω_m ≈ ω = r = s, and algorithmic blocking (also known as 'distributed panels'), where ω ≈ ω_m; r = s ≈ 1. These represent strategies at opposite ends of a load balance and communication cost tradeoff. In this paper, we present two new techniques for panel formation, called pipelining with lookahead and panel scattering. The former requires communication to be uni-directional across a processor dimension, and thus can normally only be applied to the column panel. It can be characterized by ω_m ≈ ω = s, with communication and computation overlapped across processor columns, at the cost of some pipeline startup time. The latter uses ω_m ≈ ω = r = s, but involves scattering the panel along its longest dimension across all processors, to be collected and broadcast when the panel formation is complete. While it achieves perfect load balance, this method can double the communication volume. Implementation issues for these methods will be discussed. For a given target architecture, the optimum method (or combination of methods) depends on the communication-to-computation performance ratio of that architecture.
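The load-balance side of the tradeoff between storage blocking and algorithmic blocking can be illustrated with a small sketch. It is not taken from the paper: the grid dimensions `P` and `Q`, the helper names `owner` and `panel_columns`, and the example sizes are all assumptions chosen for illustration. The sketch maps matrix elements to processors under an r × s block-cyclic distribution and counts how many processor columns hold a column panel of width ω.

```python
def owner(i, j, r, s, P, Q):
    """Grid coordinates of the processor owning matrix element (i, j)
    under an r x s block-cyclic distribution on a P x Q grid (assumed names)."""
    return (i // r) % P, (j // s) % Q


def panel_columns(j0, width, s, Q):
    """Processor columns that hold part of the column panel [j0, j0 + width)."""
    return {(j // s) % Q for j in range(j0, j0 + width)}


P, Q = 2, 4   # hypothetical processor grid
omega = 8     # hypothetical panel width

# Storage blocking: the distribution block width equals the panel width.
storage = panel_columns(0, omega, s=omega, Q=Q)

# Algorithmic blocking: the distribution block width is 1.
algorithmic = panel_columns(0, omega, s=1, Q=Q)

print(len(storage), len(algorithmic))  # prints "1 4"
```

With these assumed sizes, the storage-blocked panel lives in a single processor column, so the other columns are idle while it is formed, whereas the algorithmically blocked panel is spread over all four columns and therefore needs communication between them during panel formation.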


Similar resources

A Unified Algorithm for Load-balancing Adaptive Scientific Simulations

Adaptive scientific simulations require that periodic repartitioning occur dynamically throughout the course of the simulation. The computed repartitionings should minimize both the inter-processor communications incurred during the iterative mesh-based computation and the data redistribution costs required to balance the load. Recently developed schemes for computing repartitionings provide the...


How Architecture Evolution Influences the Scheduling Discipline used in Shared-Memory Multiprocessors

Parallel applications execute efficiently only when they distribute their workload among the available processors so that no processors are idle while there is work to do and the interactions among the processors, in the form of communication or synchronization overhead, are minimized. Communication is every form of information exchange, including message passing, cache misses, and non-local memory acce...


Load-balance in parallel FACR

Fourier Analysis Cyclic Reduction is a class of very efficient methods for the solution of Poisson's equation on regular grids. We show that exploiting the numerical properties of the tridiagonal systems involved may reduce the factorization work required to a few percent of a normal factorization. We also show that exploiting this property on distributed memory parallel processor architectures m...


On the Competitive Analysis of Randomized Static Load Balancing

Static load balancing is attractive due to its simplicity and low communication costs. We analyze under which circumstances a randomized static load balancer can achieve good balance if the subproblem sizes are unknown and chosen by an adversary. It turns out that this worst case scenario is quite close to a more specialized model for applications related to parallel backtrack search. In both ...


On Runtime Parallel Scheduling

Parallel scheduling is a new approach for load balancing. In parallel scheduling, all processors cooperate to schedule work. Parallel scheduling is able to accurately balance the load by using global load information at compile-time or runtime. It provides high-quality load balancing. This paper presents an overview of the parallel scheduling technique. Particular scheduling algori...




Publication date: 2007